Test-Driven Development of Complex Information Extraction Systems using TextMarker
نویسندگان
چکیده
Information extraction is concerned with the location of specific items in textual documents. Common process models for this task use ad-hoc testing methods against a gold standard. This paper presents an approach for the testdriven development of complex information extraction systems. We propose a process model for test-driven information extraction, and discuss its implementation using the rule-based scripting language TEXTMARKER in detail. TEXTMARKER and the test-driven approach are demonstrated by two real-world case studies in technical and medical domains.
منابع مشابه
TextMarker: A Tool for Rule-Based Information Extraction
This paper presents TEXTMARKER– a powerful toolkit for rule-based information extraction. TEXTMARKER is based on UIMA and provides versatile information processing and advanced extraction techniques. We thoroughly describe the system and its capabilities for human-like information processing and rapid prototyping of information extraction applications.
متن کاملRule-Based Information Extraction for Structured Data Acquisition using TextMarker
Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the acquired data can be applied for mining methods requiring structured input data, in contrast to other text mining methods that utilize a bag-of-words approach. This paper presents a semi-automatic approach for structur...
متن کاملA Framework for Semi-Automatic Development of Rule-based Information Extraction Applications
For the successful processing and handling of (large scale) document collections, effective information extraction methods are essential. This paper presents a framework for the semiautomatic development of rule-based information extraction applications based on the TEXTMARKER language utilizing machine learning methods. We describe the approach in detail and present the TEXTRULER system as an ...
متن کاملA review on EEG based brain computer interface systems feature extraction methods
The brain – computer interface (BCI) provides a communicational channel between human and machine. Most of these systems are based on brain activities. Brain Computer-Interfacing is a methodology that provides a way for communication with the outside environment using the brain thoughts. The success of this methodology depends on the selection of methods to process the brain signals in each pha...
متن کاملA review on EEG based brain computer interface systems feature extraction methods
The brain – computer interface (BCI) provides a communicational channel between human and machine. Most of these systems are based on brain activities. Brain Computer-Interfacing is a methodology that provides a way for communication with the outside environment using the brain thoughts. The success of this methodology depends on the selection of methods to process the brain signals in each pha...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008